Customer Segmentation

Customer segmentation is one of the key components of any business. It helps a business with product testing, targeting specific customer groups, and making other important decisions.

The data set refers to clients of a wholesale distributor. It includes the annual spending in monetary units (m.u.) on diverse product categories.

The data has been taken from Kaggle. It has the following features:

In this Assignment

Step 1: Understand the dataset

Step 2: Exploratory Data Analysis

Step 3: Principal Component Analysis

Step 4: Kernel Principal Component Analysis

Step 5: K-Means Clustering with Elbow Method

Step 6: Interactive Cluster Analysis

EDA

In this section, we look at the relationships among the features and at the distribution of each feature.
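A minimal NumPy sketch of the two EDA checks: per-feature distribution summaries and pairwise feature correlations. The column names are an assumption (the feature list is not shown in this notebook), and the data are synthetic, drawn from a lognormal distribution purely for illustration. With pandas, `df.describe()` and `df.corr()` give comparable summaries.

```python
import numpy as np

# Assumed feature names (hypothetical): the notebook does not list its columns here.
FEATURES = ["Fresh", "Milk", "Grocery", "Frozen", "Detergents_Paper", "Delicassen"]


def summarize(X, names):
    """Per-feature mean, standard deviation and (population) skewness."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Skewness = mean of the cubed z-scores; > 0 means a long right tail.
    skew = (((X - mean) / std) ** 3).mean(axis=0)
    return {n: (m, s, k) for n, m, s, k in zip(names, mean, std, skew)}


def correlation_matrix(X):
    """Pearson correlation between every pair of columns."""
    return np.corrcoef(X, rowvar=False)


# Synthetic stand-in for the real spending data (lognormal, so right-skewed).
rng = np.random.default_rng(0)
X = rng.lognormal(mean=8.0, sigma=1.0, size=(200, len(FEATURES)))

stats = summarize(X, FEATURES)
corr = correlation_matrix(X)
for name, (m, s, k) in stats.items():
    print(f"{name:>16}: mean={m:10.1f}  std={s:10.1f}  skew={k:5.2f}")
print(np.round(corr, 2))
```

Strongly positive skewness is typical of monetary amounts and is one reason the data are scaled before PCA and clustering.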

Scaling the data
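Scaling matters here because both PCA and K-means depend on variances and Euclidean distances, so a category with large monetary amounts would otherwise dominate. A minimal sketch of standardization (z-scores), assuming that is the scaling used; it matches what scikit-learn's `StandardScaler` does with its default settings. The toy values are made up.

```python
import numpy as np


def standardize(X):
    """Scale each column to zero mean and unit variance (z-scores).

    Returns the scaled data plus the mean and std so the same transform can
    be applied to new rows later.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant columns
    return (X - mean) / std, mean, std


# Toy data: two features on very different scales (made-up values).
X = np.array([[1000.0, 2.0],
              [3000.0, 4.0],
              [5000.0, 9.0]])
Z, mu, sigma = standardize(X)
print(Z.mean(axis=0))  # approximately [0, 0]
print(Z.std(axis=0))   # approximately [1, 1]
```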

PCA

Here we reduce the dimensionality of the data with Principal Component Analysis (PCA). In this section, we also compare the results of PCA with those of Kernel PCA, and compare three kernel types with each other: polynomial, cosine, and RBF.
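A from-scratch sketch of PCA via the singular value decomposition, assuming the input has already been scaled. The explained-variance ratio it returns is the usual way to judge how many components to keep; scikit-learn's `PCA` exposes the same quantity as `explained_variance_ratio_`. The data are synthetic.

```python
import numpy as np


def pca(X, n_components):
    """PCA via SVD of the centered data.

    Returns the projected data (scores), the principal axes (as rows), and the
    fraction of total variance explained by each kept component.
    """
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_var = S ** 2 / (X.shape[0] - 1)
    ratio = explained_var / explained_var.sum()
    components = Vt[:n_components]
    scores = Xc @ components.T
    return scores, components, ratio[:n_components]


rng = np.random.default_rng(1)
# Correlated toy data: the third column is almost a copy of the first.
base = rng.normal(size=(100, 2))
X = np.column_stack([base[:, 0], base[:, 1],
                     base[:, 0] + 0.05 * rng.normal(size=100)])

scores, components, ratio = pca(X, n_components=2)
print(scores.shape)        # (100, 2)
print(np.round(ratio, 3))  # the first two components carry almost all the variance
```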

Kernel PCA

Let's see whether Kernel PCA performs better than linear PCA.
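Kernel PCA replaces the covariance matrix with a centered kernel (Gram) matrix, so the components can capture non-linear structure. A minimal NumPy sketch with the three kernels named in this section; the parameter values (`degree`, `gamma`, `coef0`) are illustrative assumptions, not the notebook's settings, and the cosine kernel assumes no all-zero rows. In practice scikit-learn's `KernelPCA` with `kernel="poly"`, `"rbf"` or `"cosine"` would normally be used.

```python
import numpy as np


def poly_kernel(A, B, degree=3, gamma=1.0, coef0=1.0):
    """(gamma * <a, b> + coef0) ** degree for every pair of rows."""
    return (gamma * A @ B.T + coef0) ** degree


def rbf_kernel(A, B, gamma=1.0):
    """exp(-gamma * ||a - b||^2) for every pair of rows."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off


def cosine_kernel(A, B):
    """Cosine similarity; assumes no all-zero rows."""
    An = A / np.linalg.norm(A, axis=1, keepdims=True)
    Bn = B / np.linalg.norm(B, axis=1, keepdims=True)
    return An @ Bn.T


def kernel_pca(X, kernel, n_components, **params):
    """Project the training rows of X onto the top kernel principal components."""
    n = X.shape[0]
    K = kernel(X, X, **params)
    # Center the kernel matrix in feature space: Kc = K - 1K - K1 + 1K1.
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    # eigh returns ascending eigenvalues; keep the largest n_components.
    vals, vecs = np.linalg.eigh(Kc)
    vals = vals[::-1][:n_components]
    vecs = vecs[:, ::-1][:, :n_components]
    # Score of row i on component j is sqrt(lambda_j) * v_ij.
    return vecs * np.sqrt(np.maximum(vals, 0.0))


rng = np.random.default_rng(2)
X = rng.normal(size=(60, 4))  # synthetic, already-scaled data

projections = {}
for name, kern, params in [("poly", poly_kernel, {"degree": 2}),
                           ("rbf", rbf_kernel, {"gamma": 0.5}),
                           ("cosine", cosine_kernel, {})]:
    projections[name] = kernel_pca(X, kern, n_components=2, **params)
    print(name, projections[name].shape)
```

With a linear kernel (`A @ B.T`) this reduces to ordinary PCA, up to the sign of each component, which is a useful sanity check.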

Poly Function
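For reference, the polynomial kernel, written with the parameters scikit-learn uses (`gamma` as $\gamma$, `coef0` as $c_0$, `degree` as $d$):

$$
k(\mathbf{x}, \mathbf{y}) = \left(\gamma\, \mathbf{x}^\top \mathbf{y} + c_0\right)^{d}
$$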

Radial Basis Function
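For reference, the RBF (Gaussian) kernel; a larger $\gamma$ makes the similarity decay faster with distance:

$$
k(\mathbf{x}, \mathbf{y}) = \exp\left(-\gamma \lVert \mathbf{x} - \mathbf{y} \rVert^2\right)
$$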

Cosine Function
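For reference, the cosine kernel depends only on the angle between two spending vectors, not on their length:

$$
k(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x}^\top \mathbf{y}}{\lVert \mathbf{x} \rVert \, \lVert \mathbf{y} \rVert}
$$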

K-Means Algorithm
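K-means assigns each point to one of $k$ clusters $C_1, \dots, C_k$ so as to minimize the within-cluster sum of squared distances $J$ (called inertia in scikit-learn, and the quantity usually plotted for the elbow method). Lloyd's algorithm alternates between assigning every point to its nearest centroid and recomputing each centroid as the mean of its points:

$$
J = \sum_{j=1}^{k} \sum_{\mathbf{x} \in C_j} \lVert \mathbf{x} - \boldsymbol{\mu}_j \rVert^2,
\qquad
\boldsymbol{\mu}_j = \frac{1}{|C_j|} \sum_{\mathbf{x} \in C_j} \mathbf{x}
$$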

Elbow Method

We verify our assumption of k = 5 using the elbow method.
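A from-scratch sketch of the elbow method, assuming it is applied to inertia as in the usual setup: run K-means (here Lloyd's algorithm with k-means++ seeding and several restarts) for a range of k and look for the point where inertia stops dropping sharply. The elbow is often read off a plot by eye; `elbow_k` is a hypothetical helper implementing one simple numeric heuristic (largest second difference), not a standard function. The toy data have three well-separated groups, not five.

```python
import numpy as np


def sq_dists(X, C):
    """Squared Euclidean distance from every row of X to every row of C."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)


def kmeans(X, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's algorithm with k-means++ seeding; returns (labels, centers, inertia)."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        # k-means++: each new center is drawn with probability proportional to
        # its squared distance from the nearest center chosen so far.
        centers = X[[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = sq_dists(X, centers).min(axis=1)
            p = d2 / d2.sum() if d2.sum() > 0 else None
            centers = np.vstack([centers, X[rng.choice(len(X), p=p)]])
        # Lloyd iterations: assign to nearest center, then move centers to means.
        for _ in range(max_iter):
            labels = sq_dists(X, centers).argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        labels = sq_dists(X, centers).argmin(axis=1)
        inertia = float(((X - centers[labels]) ** 2).sum())
        if best is None or inertia < best[2]:
            best = (labels, centers, inertia)
    return best


def elbow(X, k_max=10):
    """Inertia (within-cluster sum of squares) for k = 1..k_max."""
    return [kmeans(X, k)[2] for k in range(1, k_max + 1)]


def elbow_k(inertias):
    """Hypothetical heuristic: k with the largest second difference of inertia."""
    I = np.asarray(inertias)
    second = I[:-2] - 2 * I[1:-1] + I[2:]
    return int(np.argmax(second)) + 2  # second[0] is centered on k = 2


rng = np.random.default_rng(3)
true_centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(50, 2)) for c in true_centers])

inertias = elbow(X, k_max=8)
print(np.round(inertias, 1))
print("elbow at k =", elbow_k(inertias))
```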

Interactive Cluster Analysis
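The interactive plots are not reproduced here. A minimal, non-interactive way to compare clusters, and to check a statement such as one cluster resembling another, is to look at each cluster's size and mean spending. The helper names are hypothetical; profiles are easiest to interpret when computed on the unscaled spending values, with labels taken from the fitted K-means model. The data and labels below are made up.

```python
import numpy as np


def cluster_profiles(X, labels):
    """Map each cluster id to (size, mean feature vector)."""
    return {int(c): (int((labels == c).sum()), X[labels == c].mean(axis=0))
            for c in np.unique(labels)}


def closest_cluster_pair(profiles):
    """Return (a, b, distance) for the two clusters with the nearest means."""
    ids = sorted(profiles)
    best = None
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            d = float(np.linalg.norm(profiles[a][1] - profiles[b][1]))
            if best is None or d < best[2]:
                best = (a, b, d)
    return best


# Toy spending data with made-up labels (not results from this notebook).
X = np.array([[1.0, 1.0], [1.2, 0.8], [1.5, 1.5],
              [1.3, 1.7], [9.0, 9.0], [9.5, 8.5]])
labels = np.array([0, 0, 1, 1, 2, 2])

profiles = cluster_profiles(X, labels)
for c, (size, mean) in profiles.items():
    print(f"cluster {c}: n={size}, mean={np.round(mean, 2)}")
print(closest_cluster_pair(profiles))
```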

Conclusions:

A k value of 5 yielded the best solution, which the elbow method also confirmed. The five clusters correspond to customers with different purchasing preferences. Customers in cluster 0 behave very similarly to customers in cluster 1. The business can focus on the customers in cluster 0 to increase their purchase volume and expand the business.

Next Steps:

We can try other clustering algorithms, such as DBSCAN and hierarchical clustering, and compare the results.

Data Source: https://www.kaggle.com/sahistapatel96/wholesale-customer-datacsv